Unsupervised Word Alignment Using Frequency Constraint in Posterior Regularized EM
Authors
Abstract
Generative word alignment models, such as the IBM Models, are restricted to one-to-many alignments and cannot explicitly represent many-to-many relationships in a bilingual text. The problem is partially solved either by introducing heuristics or by agreement constraints that force the two directional word alignments to agree with each other. In this paper, we focus on the posterior regularization framework (Ganchev et al., 2010), which can force two directional word alignment models to agree with each other during training, and propose new constraints that take into account the difference between function words and content words. Experimental results on French-to-English and Japanese-to-English alignment tasks show statistically significant gains over the previous posterior regularization baseline. We also observed gains in Japanese-to-English translation tasks, which demonstrates the effectiveness of our methods on grammatically divergent language pairs.
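For context, a minimal sketch of the posterior-regularized E-step in the bidirectional agreement setting of Ganchev et al. (2010); the notation ($q$, $\mathcal{Q}$, $\phi$) is ours, and the frequency-based function-word/content-word constraints proposed in this paper are not shown:

\begin{align*}
  q^{*} &= \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left(q(\mathbf{z}) \,\big\|\, p_{\theta}(\mathbf{z} \mid \mathbf{x})\right), \\
  \mathcal{Q} &= \Bigl\{\, q \;:\; \mathbb{E}_{q}\bigl[\phi^{\rightarrow}_{ij}(\mathbf{z})\bigr] = \mathbb{E}_{q}\bigl[\phi^{\leftarrow}_{ij}(\mathbf{z})\bigr] \ \text{ for all word pairs } (i,j) \,\Bigr\},
\end{align*}

where $\phi^{\rightarrow}_{ij}$ and $\phi^{\leftarrow}_{ij}$ indicate that source word $i$ is linked to target word $j$ under the forward and reverse models, and $q^{*}$ replaces the model posterior when expected counts are collected for the M-step.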
Similar resources
Large-scale Word Alignment Using Soft Dependency Cohesion Constraints
Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats this as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. ...
Unsupervised Word Alignment by Agreement Under ITG Constraint
We propose a novel unsupervised word alignment method that uses a constraint based on Inversion Transduction Grammar (ITG) parse trees to jointly unify two directional models. Previous agreement methods are not helpful for locating long-distance alignments because they do not use any syntactic structure. In contrast, the proposed method symmetrizes alignments in consideration of their st...
Online EM for Unsupervised Models
The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and w...
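As a rough illustration of such online variants (a common stepwise-EM update rule, which may differ in detail from the one evaluated in that paper), the expected sufficient statistics are interpolated one example at a time instead of being recomputed over the whole corpus:

\begin{equation*}
  \mu^{(k+1)} = (1 - \eta_k)\,\mu^{(k)} + \eta_k\, s_i\bigl(\theta^{(k)}\bigr), \qquad \eta_k = (k + 2)^{-\alpha}, \quad \alpha \in (0.5, 1],
\end{equation*}

where $s_i(\theta^{(k)})$ are the expected sufficient statistics of example $i$ under the current parameters, and $\theta^{(k+1)}$ is re-estimated from the running average $\mu^{(k+1)}$ after each update.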
A Framework for Tuning Posterior Entropy in Unsupervised Learning
We present a general framework for unsupervised and semi-supervised learning containing a graded spectrum of Expectation Maximization (EM) algorithms. We call our framework Unified Expectation Maximization (UEM). UEM allows us to tune the entropy of the inferred posterior distribution during the E-step to impact the quality of learning. Furthermore, UEM covers existing algorithms like standard ...
Feature-Based ITG for Unsupervised Word Alignment
Inversion transduction grammar (ITG) [1] is an effective constraint on the word alignment search space. However, the traditional unsupervised ITG word alignment model is incapable of utilizing rich features. In this paper, we propose a novel feature-based unsupervised ITG word alignment model. With the...
Publication date: 2014